Auto-parallelisation of Sieve C++ Programs
Authors
Abstract
We describe an approach to automatic parallelisation of programs written in Sieve C++ (Codeplay’s C++ extension), using the Sieve compiler and runtime system. In Sieve C++, the programmer encloses a performance-critical region of code in a sieve block, thereby instructing the compiler to delay side-effects until the end of the block. The Sieve system partitions code inside a sieve block into independent fragments and speculatively distributes them among multiple cores. We present implementation details and experimental results for the Sieve system on the Cell BE processor.
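The delayed side-effect semantics can be emulated in standard C++. The sketch below is not Codeplay's implementation; it is a minimal illustration, with an assumed `SieveBlock` helper, of the key property the abstract describes: writes to memory outside the block are queued and applied in order only on exit, so reads inside the block see the pre-block state.

```cpp
#include <functional>
#include <vector>

// Minimal sketch (assumed helper, not Codeplay's API): writes made inside
// a sieve-like block are queued rather than performed immediately, then
// applied in order when the block ends. Reads inside the block therefore
// observe the state from before the block started, which removes
// read-after-write dependences and eases automatic parallelisation.
struct SieveBlock {
    std::vector<std::function<void()>> pending;  // queued side-effects

    // Queue a write instead of performing it now.
    void write(int& target, int value) {
        pending.push_back([&target, value] { target = value; });
    }

    // "End of sieve block": apply the queued writes in program order.
    void commit() {
        for (auto& w : pending) w();
        pending.clear();
    }
};
```

Because the queued writes are independent of reads inside the block, the fragments that produce them could in principle run on separate cores, as the Sieve system does.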
Similar Resources
Delayed Side-Effects Ease Multi-core Programming
Computer systems are increasingly parallel and heterogeneous, while programs are still largely written in sequential languages. The obvious suggestion that the compiler should automatically distribute a sequential program across the system usually fails in practice because of the complexity of dependence analysis in the presence of aliasing. We introduce the sieve language construct which facil...
Strict and Relaxed Sieving for Multi-Core Programming
In Codeplay’s Sieve C++, the programmer can place code inside a “sieve block” thereby instructing the compiler to delay writes to global memory and apply them in order on exit from the block. The semantics of sieve blocks makes code more amenable to automatic parallelisation. However, strictly queueing writes until the end of a sieve block incurs overheads and is typically unnecessary. If the p...
Parallelisation of the Model-based Iterative Reconstruction Algorithm DIRA
New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphics processing units (GPUs). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the pa...
Influence of the Sparse Matrix Structure on Automatic Parallelisation Efficiency
The simulated models and requirements of engineering programs like computational fluid dynamics and structural mechanics grow more rapidly than single-processor performance. Automatic parallelisation seems to be the obvious approach for huge and historic packages like PERMAS. In this paper we evaluate how preparatory steps on the big input matrices can improve the performance of the parallelisa...
Towards Automatic Parallelisation for Multi-Processor DSPs
This paper describes a preliminary compiler-based approach to achieving high-performance DSP applications by automatically mapping C programs to multi-processor DSP systems. DSP programs typically contain pointer-based memory accesses, making automatic parallelisation difficult. This paper presents a new method to convert a restricted class of pointer-based memory accesses into array accesses wi...